ABSTRACT

At present, the most popular heuristic informativeness criteria are associated with estimating the separability of given classes. They are based on the fundamental compactness hypothesis of pattern recognition: as the distance between classes increases, their separability improves. "Good" features are those that maximize this distance.

Although such heuristic criteria are widely used in solving practical classification problems, they have scarcely been explored theoretically.
INTRODUCTION

The formation of the feature space in classification problems can be divided into two stages: selection of the initial description of objects, and formation of an informative description of objects by reducing the dimension of the initial description space.

At the first stage, the original features of the system are selected. They are useful, to varying degrees, for separating a predetermined alphabet of patterns (classes), and they make it possible to obtain the a priori information needed to describe the patterns in the language of features. This stage is the least developed in data analysis problems: at present there are no formalized methods for carrying it out. When the initial features of a system are determined, prior knowledge, intuition and experience in the subject field are widely used. One should also take into account the important fact that every real object has an infinite number of different properties reflecting its various aspects. Naturally, not all of these properties are essential in each particular case; only a limited set of defining features actually allows classification. Finding such features always requires a careful examination of the substantive nature of the classified objects, using experimental data on their properties. Software data analysis tools, such as tools for exploratory analysis, knowledge discovery and verification of various feature systems, may be useful here. Structural data processing methods, which study the structural relationships in the geometric configuration of object points in a multidimensional description space, can also be of great help. Analysis of the data structure helps the researcher understand which properties of objects contribute to the separation of patterns and evaluate the information content of individual features.
The purpose of the second stage is to determine the set of object attributes that is most useful for classification. This stage is necessary for the following reason.

When the initial system of features has been selected, it usually turns out to be quite redundant. There are arguments "for" and "against" preserving such redundancy. The argument "for" is that increasing the number of features allows objects to be described more fully. The argument "against" is that increasing the number of features increases the "noise" in the data, makes the data harder to process and requires additional processing time.

Consequently, the argument "for" rests mainly on statistical preconditions, whereas the argument "against" is driven mainly by non-statistical ones. While practical motives are almost always relevant, the conditions under which statistical methods apply are satisfied much less often than is usually assumed. Reference [1] gives the following criteria for the applicability of statistical methods:

• The experiment can be repeated many times under the same conditions;

• The outcome of the experiment cannot be predicted because of the influence of a large number of random factors;

• As the number of experiments increases, the results converge to certain values.

Moreover, the authors of [1] point out that no rigorous mathematical methods exist for verifying that these conditions hold in a particular case. They single out sociology, demography, reliability theory and sampling-based quality control as areas where these conditions are fulfilled in most cases. Very often, however, the conditions are violated, wholly or partly, usually because the second part of the first criterion is not met, i.e. the experimental conditions are not kept the same.

In seeking an answer to the question of how many objects should be taken, assuming that the conditions of a statistical ensemble hold, and how many features should be measured (from a statistical rather than a domain point of view) to obtain a result with a given accuracy, it is advisable to refer to studies that estimate recognition errors for different training sample sizes m and numbers of features N [1-5]. The following conclusions can be drawn from them:

• The error increases rapidly as the number of features N grows and decreases slowly as the number of objects m grows;

• Increasing the number of features requires a significant increase in the training sample size to achieve the same error.

Therefore, non-statistical considerations arising from the nature of the problem being solved and the specifics of the subject area should play the primary role in choosing the number of features. Only when the conditions of a statistical ensemble hold, which is usually very difficult to verify, can one rely on statistical findings about the sample size required to achieve a given accuracy.

When classification has to be performed with a small training sample, reducing the dimension of the original feature space becomes crucial. Usually, such a transformation of the feature space reduces to selecting a relatively small number of features that have the greatest information content according to the chosen criterion.
In general, when discussing the transformation of the feature space and the choice of an informativeness criterion, one should bear in mind that a feature transformation carried out without regard to classification quality leads to the problem of representing the original data in a space of smaller dimension. The resulting set of features optimizes some criterion function that ignores the division of objects into classes. If features are selected to improve the performance of a classification system, the selection criterion is linked to the separability of classes. In accordance with these goals, two approaches to reducing the dimension of the original feature space are commonly used in applied research.

In the first approach, new features are determined without regard to classification quality; this is the data representation problem. It arises when processing large amounts of information, when the original system of features $x = (x^1, \dots, x^N)$ must be replaced by a set of auxiliary variables of significantly smaller dimension, $z(x) = (z^1(x), \dots, z^l(x))$, $l < N$. According to [2], the goal is the most accurate recovery of the $m \times N$ values of the initial features $x_j^1, x_j^2, \dots, x_j^N$ from a substantially smaller number, $m \times l$, of values of the auxiliary variables $z_j^1, z_j^2, \dots, z_j^l$, $j = 1, \dots, m$, where m is the number of objects in the given sample. If such a replacement is possible, it leads to the problem of representing the original data in a space of smaller dimension.

In the second approach, the search for features is tied to evaluating classification quality. In this case the feature space is specified, i.e. a set of informative features is determined that is chosen so as to solve the classification problem adequately.

The subject of this article is the development of an approach based on heuristic criteria of feature informativeness associated with evaluating the separability of classes in a given training sample.

STATEMENT OF THE PROBLEM AND THE CONCEPT OF ITS SOLUTION


The informativeness criterion considered below is a heuristic one, based on assessing the separability of objects of a given training sample using the Euclidean metric.

Suppose a training sample consists of objects $x_{11}, x_{12}, \dots, x_{1m_1}, x_{21}, x_{22}, \dots, x_{2m_2}, \dots, x_{r1}, x_{r2}, \dots, x_{rm_r}$, for which it is known that each group of objects $x_{p1}, x_{p2}, \dots, x_{pm_p}$ belongs to a particular class $X_p$, $p = 1, \dots, r$.

Each object $x_{pi}$ is an N-dimensional vector of numeric attributes, i.e.

$$x_{pi} = \left(x_{pi}^1, x_{pi}^2, \dots, x_{pi}^N\right).$$

For a given training sample of objects $x_{p1}, x_{p2}, \dots, x_{pm_p} \in X_p$, $p = 1, \dots, r$, where $x_{pi}$ is a vector in the N-dimensional feature space, we introduce the vector $\lambda = (\lambda^1, \lambda^2, \dots, \lambda^N)$, $\lambda^k \in \{0, 1\}$, $k = 1, \dots, N$, which, as noted in the previous section, uniquely characterizes a particular subsystem of features. Components of $\lambda$ equal to one indicate that the corresponding features are present in this subsystem, and zero components indicate that they are absent.

We assume that the feature space $\{x = (x^1, x^2, \dots, x^N)\}$ is Euclidean and denote it by $R^N$.
Definition 1: The truncation of the space $R^N = \{x = (x^1, x^2, \dots, x^N)\}$ by the vector $\lambda$ is the space

$$R_\lambda^N = \{x_\lambda = (\lambda^1 x^1, \lambda^2 x^2, \dots, \lambda^N x^N)\}.$$

By the truncated distance between two objects $x, y \in R^N$ we mean the Euclidean distance between $x_\lambda$ and $y_\lambda$ in $R_\lambda^N$, i.e.

$$\|x - y\|_\lambda = \sqrt{\sum_{k=1}^{N} \lambda^k \left(x^k - y^k\right)^2}.$$

Definition 2: A vector $\lambda$ is $\ell$-informative if the sum of its components equals $\ell$, i.e.

$$\sum_{i=1}^{N} \lambda^i = \ell.$$

Each $\ell$-informative vector $\lambda$ defines its own $\ell$-dimensional feature subspace. In each of these subspaces we introduce a norm with respect to the truncation by $\lambda$; for simplicity we choose the Euclidean norm:

$$\|x\|_\lambda = \sqrt{\sum_{j=1}^{N} \lambda^j \left(x^j\right)^2}.$$
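The truncation, the truncated distance and the truncated norm can be written out directly. The following Python sketch is illustrative only: the function names (`truncate`, `truncated_norm`, `truncated_distance`, `is_informative`) are hypothetical, and $\lambda$ is assumed to be given as a 0/1 sequence of length N.

```python
import math
from typing import List, Sequence


def truncate(x: Sequence[float], lam: Sequence[int]) -> List[float]:
    """Truncation of x by lam: (lam^1 x^1, ..., lam^N x^N)."""
    return [l * xk for l, xk in zip(lam, x)]


def truncated_norm(x: Sequence[float], lam: Sequence[int]) -> float:
    """||x||_lam = sqrt(sum_j lam^j (x^j)^2)."""
    return math.sqrt(sum(l * xk * xk for l, xk in zip(lam, x)))


def truncated_distance(x: Sequence[float], y: Sequence[float], lam: Sequence[int]) -> float:
    """||x - y||_lam: Euclidean distance between the truncated points x_lam and y_lam."""
    return truncated_norm([xk - yk for xk, yk in zip(x, y)], lam)


def is_informative(lam: Sequence[int], ell: int) -> bool:
    """Definition 2: lam is a 0/1 vector whose components sum to ell."""
    return all(l in (0, 1) for l in lam) and sum(lam) == ell
```

For example, `truncated_distance((1, 2, 3), (4, 6, 3), (1, 1, 0))` keeps only the first two features and gives 5.0.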


We denote

$$\bar{x}_p = \frac{1}{m_p} \sum_{i=1}^{m_p} x_{pi}, \quad p = 1, \dots, r,$$

where $\bar{x}_p$ is the average object of class $X_p$.


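We introduce the function

$$S_p(\lambda) = \sqrt{\frac{1}{m_p} \sum_{i=1}^{m_p} \left\|x_{pi} - \bar{x}_p\right\|_\lambda^2}.$$

The function $S_p(\lambda)$ describes the average spread of objects of class $X_p$ in the subset of features defined by the vector $\lambda$. We define the informativeness criterion of feature subsystems as the functional

$$I_1(\lambda) = \frac{\sum_{p,q=1}^{r} \left\|\bar{x}_p - \bar{x}_q\right\|_\lambda^2}{\sum_{p=1}^{r} S_p^2(\lambda)}. \qquad (1)$$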
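A direct evaluation of functional (1) from a training sample is sketched below in Python. It assumes each class $X_p$ is given as a list of equal-length numeric tuples and computes $S_p^2(\lambda)$ as the mean squared truncated distance of the class objects to the class mean; the helper names are hypothetical. As in (1), the numerator runs over all ordered pairs $(p, q)$, so each pair of distinct classes is counted twice, which does not change which $\lambda$ maximizes the criterion.

```python
from typing import List, Sequence

Vector = Sequence[float]


def class_mean(objects: Sequence[Vector]) -> List[float]:
    """Average object of a class: component-wise mean of its vectors."""
    m = len(objects)
    return [sum(column) / m for column in zip(*objects)]


def sq_norm(v: Vector, lam: Sequence[int]) -> float:
    """Squared truncated norm: sum_j lam^j (v^j)^2."""
    return sum(l * vj * vj for l, vj in zip(lam, v))


def spread_sq(objects: Sequence[Vector], lam: Sequence[int]) -> float:
    """S_p^2(lam): mean squared truncated distance of the objects to their class mean."""
    mean = class_mean(objects)
    return sum(sq_norm([xj - mj for xj, mj in zip(x, mean)], lam) for x in objects) / len(objects)


def fisher_functional(classes: Sequence[Sequence[Vector]], lam: Sequence[int]) -> float:
    """Functional (1): between-class term divided by the within-class term."""
    means = [class_mean(c) for c in classes]
    # all ordered pairs (p, q); the terms with p == q are zero
    between = sum(sq_norm([u - w for u, w in zip(mp, mq)], lam) for mp in means for mq in means)
    within = sum(spread_sq(c, lam) for c in classes)
    return between / within


classes = [[(0, 0), (2, 2)], [(4, 1), (6, 3)]]
for lam in [(1, 0), (0, 1), (1, 1)]:
    print(lam, fisher_functional(classes, lam))
```

With this two-class sample the criterion equals 16.0 for $\lambda = (1, 0)$, 1.0 for $\lambda = (0, 1)$ and 8.5 for $\lambda = (1, 1)$.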


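This functional is a generalization of Fisher's functional [3]. We denote

$$a = (a^1, a^2, \dots, a^N), \quad b = (b^1, b^2, \dots, b^N),$$

$$a^j = \sum_{p,q=1}^{r} \left(\bar{x}_p^j - \bar{x}_q^j\right)^2, \quad j = 1, \dots, N,$$

$$b^j = \sum_{p=1}^{r} \frac{1}{m_p} \sum_{i=1}^{m_p} \left(x_{pi}^j - \bar{x}_p^j\right)^2, \quad j = 1, \dots, N. \qquad (2)$$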


Then functional (1) reduces to the form

$$I(\lambda) = \frac{(a, \lambda)}{(b, \lambda)},$$

where $(\cdot, \cdot)$ is the scalar product of vectors. The coefficients $a^j$, $b^j$ do not depend on $\lambda$ and are calculated in advance. Computing the functional $I(\lambda)$ for each $\lambda$ then requires on the order of N operations.
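Because $a$ and $b$ do not depend on $\lambda$, they can be computed once, after which each candidate $\lambda$ costs two O(N) scalar products. A minimal Python sketch under the same assumptions as before (classes given as lists of equal-length numeric tuples; function names hypothetical):

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def fisher_coefficients(classes: Sequence[Sequence[Vector]]) -> Tuple[List[float], List[float]]:
    """Coefficients a^j and b^j of (2); computed once, independent of lam."""
    n = len(classes[0][0])
    means = [[sum(col) / len(c) for col in zip(*c)] for c in classes]
    # a^j: squared differences of class means over all ordered pairs (p, q)
    a = [sum((mp[j] - mq[j]) ** 2 for mp in means for mq in means) for j in range(n)]
    # b^j: per-class mean squared deviation along feature j, summed over classes
    b = [
        sum(sum((x[j] - mp[j]) ** 2 for x in c) / len(c) for c, mp in zip(classes, means))
        for j in range(n)
    ]
    return a, b


def fisher_ratio(a: Sequence[float], b: Sequence[float], lam: Sequence[int]) -> float:
    """I(lam) = (a, lam) / (b, lam); O(N) work per candidate lam."""
    num = sum(aj * lj for aj, lj in zip(a, lam))
    den = sum(bj * lj for bj, lj in zip(b, lam))
    return num / den


a, b = fisher_coefficients([[(0, 0), (2, 2)], [(4, 1), (6, 3)]])
print(a, b, fisher_ratio(a, b, (1, 1)))
```

For this sample the sketch gives $a = (32, 2)$, $b = (2, 2)$ and $I(1, 1) = 34/4 = 8.5$, the same value obtained by evaluating (1) directly.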


Hereafter, the criterion given by the functional in the form (2) is called the Fisher informativeness criterion and is denoted $I_1(\lambda)$. This criterion was studied in [2, 5], where, in particular, its efficiency was evaluated and methods for selecting informative features by maximizing functional (2) were proposed.

Many methods have been developed for determining a set of informative features on the basis of this simple form of the Fisher criterion. One of them is the "Orderings" method; however, this method does not always yield the optimal solution with respect to the Fisher criterion.


For example, for $a = (5, 10, 10, 1)$, $b = (1, 50, 50, 19)$, $N = 4$ and $\ell = 2$, the optimal solution is $\lambda = (1, 0, 0, 1)$, whereas the vector $\lambda = (1, 1, 0, 0)$ is not optimal: $I(1, 0, 0, 1) = 6/20 = 0.3$, while $I(1, 1, 0, 0) = 15/51 \approx 0.294$.
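The example can be reproduced by enumeration. This section does not spell out the "Orderings" method, so the sketch below assumes it ranks features by $a_i / b_i$ in decreasing order (ties kept in index order) and keeps the first $\ell$ of them; the exact optimum is found by brute force over all $\binom{N}{\ell}$ $\ell$-informative vectors. All names are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def ratio(a: Sequence[float], b: Sequence[float], lam: Sequence[int]) -> float:
    """I(lam) = (a, lam) / (b, lam)."""
    return sum(x * l for x, l in zip(a, lam)) / sum(y * l for y, l in zip(b, lam))


def ordering_solution(a, b, ell) -> Tuple[int, ...]:
    """ASSUMED "Orderings" rule: keep the ell features with the largest a_i / b_i."""
    # sorted() stays stable with reverse=True, so tied ratios keep index order
    order = sorted(range(len(a)), key=lambda i: a[i] / b[i], reverse=True)
    chosen = set(order[:ell])
    return tuple(1 if i in chosen else 0 for i in range(len(a)))


def brute_force_solution(a, b, ell) -> Tuple[int, ...]:
    """Exact optimum by enumerating all C(N, ell) candidate vectors."""
    best = None
    for subset in combinations(range(len(a)), ell):
        lam = tuple(1 if i in subset else 0 for i in range(len(a)))
        if best is None or ratio(a, b, lam) > ratio(a, b, best):
            best = lam
    return best


a, b = (5, 10, 10, 1), (1, 50, 50, 19)
print(ordering_solution(a, b, 2), ratio(a, b, ordering_solution(a, b, 2)))
print(brute_force_solution(a, b, 2), ratio(a, b, brute_force_solution(a, b, 2)))
```

For the data above, `ordering_solution` returns (1, 1, 0, 0) with $I = 15/51$, while `brute_force_solution` returns (1, 0, 0, 1) with $I = 0.3$. Enumeration is only practical for small N; it serves here as a reference against which faster methods can be checked.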


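Below, optimality conditions for the "Orderings" method are given. Consider the following optimization problem:

$$I(\lambda) = \frac{(a, \lambda)}{(b, \lambda)} \to \max, \quad \lambda \in \Lambda^\ell, \quad \lambda_i \in \{0, 1\}, \ i = 1, \dots, N,$$

$$a, b \in R^N, \quad a_i > 0, \quad b_i > 0, \quad i = 1, \dots, N, \qquad (3)$$

where $\Lambda^\ell$ is the $\ell$-dimensional informative space of features:

$$\Lambda^\ell = \left\{\lambda : \lambda_i \in \{0, 1\}, \ i = 1, \dots, N, \ \sum_{i=1}^{N} \lambda_i = \ell\right\}.$$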
The main aim is to determine when the "Orderings" method, applied to the vectors a and b, gives the optimal solution of problem (3).

Let real numbers $a$, $b$, $c > 0$, $d > 0$ be given, with $a + c > 0$ and $b + d > 0$. Then the following lemmas hold.


Lemma 1: If $a > 0$, $b > 0$ and $\frac{c}{d} > \frac{a}{b}$, then $\frac{a}{b} < \frac{a+c}{b+d} < \frac{c}{d}$.

Lemma 2: If $a > 0$, $b > 0$ and $\frac{c}{d} < \frac{a}{b}$, then $\frac{a}{b} > \frac{a+c}{b+d} > \frac{c}{d}$.

Lemma 3: If $a < 0$, $b < 0$ and $\frac{c}{d} < \frac{a}{b}$, then $\frac{a}{b} > \frac{c}{d} > \frac{a+c}{b+d}$.

Lemma 4: If $a < 0$, $b < 0$ and $\frac{c}{d} > \frac{a}{b}$, then $\frac{a}{b} < \frac{c}{d} < \frac{a+c}{b+d}$.

Lemma 5: If $a > 0$ and $b < 0$, then $\frac{a+c}{b+d} > \frac{c}{d}$.

Lemma 6: If $a < 0$ and $b > 0$, then $\frac{a+c}{b+d} < \frac{c}{d}$.

Since the proofs of these lemmas are elementary, they are omitted.
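One short way to check all six lemmas (a sketch, not the authors' omitted proofs) is to compare the mediant $\frac{a+c}{b+d}$ with each endpoint:

$$\frac{a+c}{b+d} - \frac{c}{d} = \frac{ad - bc}{d\,(b+d)}, \qquad \frac{a}{b} - \frac{a+c}{b+d} = \frac{ad - bc}{b\,(b+d)}.$$

Since $d > 0$ and $b + d > 0$, the mediant exceeds $\frac{c}{d}$ exactly when $ad - bc > 0$. For $b > 0$ (Lemmas 1 and 2), $\frac{c}{d} < \frac{a}{b}$ is equivalent to $bc < ad$; for $b < 0$ (Lemmas 3 and 4), multiplying by $bd < 0$ reverses the inequality, so $\frac{c}{d} < \frac{a}{b}$ is equivalent to $bc > ad$. In Lemma 5, $ad > 0 > bc$, and in Lemma 6, $ad < 0 < bc$. The sign of $b$ in the second identity then fixes the position of $\frac{a}{b}$ relative to the mediant.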
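We introduce the following notation:

$$A = \sum_{i=1}^{\ell} a_i, \quad B = \sum_{i=1}^{\ell} b_i, \quad \lambda^\circ = (\underbrace{1, \dots, 1}_{\ell}, \underbrace{0, \dots, 0}_{N-\ell}),$$

$$\Delta a_{ij} = a_j - a_i, \quad \Delta b_{ij} = b_j - b_i, \quad i = 1, \dots, \ell, \ j = \ell + 1, \dots, N.$$

If in the above lemmas we set $a = \Delta a_{ij}$, $b = \Delta b_{ij}$, $c = A$, $d = B$, then for all $i, j$ ($i = 1, \dots, \ell$, $j = \ell + 1, \dots, N$), taking into account that

$$A + \Delta a_{ij} > 0, \quad B + \Delta b_{ij} > 0$$

(which hold because all $a_i, b_i > 0$), one of these lemmas holds, apart from the degenerate cases $\Delta a_{ij} = 0$, $\Delta b_{ij} = 0$ or $\frac{A}{B} = \frac{\Delta a_{ij}}{\Delta b_{ij}}$.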


In many cases, a pre-selected vector $\lambda$ may already be the optimal solution of problem (3). The following theorem therefore gives the conditions under which this happens.

Let $\lambda \in \Lambda^\ell$ be chosen.

Theorem 1: The chosen vector $\lambda$ is an optimal solution of problem (3) if and only if there are no $a = \Delta a_{ij}$ and $b = \Delta b_{ij}$ ($i = 1, \dots, \ell$, $j = \ell + 1, \dots, N$) satisfying the conditions of any of Lemmas 2, 4 and 5.
Proof:

Sufficiency. Let $\lambda^* \in \Lambda^\ell$ be arbitrary. The values

$$A^* = (a, \lambda^*) = \sum_{i=1}^{N} a_i \lambda_i^*, \quad B^* = (b, \lambda^*) = \sum_{i=1}^{N} b_i \lambda_i^*$$

can be written as

$$A^* = A + \sum_{(i,j) \in P} \Delta a_{ij}, \quad B^* = B + \sum_{(i,j) \in P} \Delta b_{ij},$$

where $P$ is a set of pairs $(i, j)$, $i \in \{1, \dots, \ell\}$, $j \in \{\ell + 1, \dots, N\}$, with $\lambda_i^* = 0$ and $\lambda_j^* = 1$, i.e. feature i is replaced by feature j. To preserve the $\ell$-informativeness of $\lambda^*$, the dropped and added features are matched one to one, so each index occurs in at most one pair of $P$.


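For $A^*$ and $B^*$, respectively, the following equalities hold:

$$A^* = A + A_1 + A_2 + A_3 + A_4 + A_5 + A_6, \quad B^* = B + B_1 + B_2 + B_3 + B_4 + B_5 + B_6,$$

where $A_k$ and $B_k$ are the sums of those $\Delta a_{ij}$ and $\Delta b_{ij}$ in the expansion of $A^*$ and $B^*$ that satisfy the conditions of Lemma k ($k = 1, \dots, 6$). From Lemma 6 it follows that

$$\frac{A + A_1 + A_2 + A_3 + A_4 + A_5 + A_6}{B + B_1 + B_2 + B_3 + B_4 + B_5 + B_6} \le \frac{A + A_1 + A_2 + A_3 + A_4 + A_5}{B + B_1 + B_2 + B_3 + B_4 + B_5}.$$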

By the conditions of the theorem, the sums $A_2$, $B_2$, $A_4$, $B_4$ and $A_5$, $B_5$ are zero. Hence

$$\frac{A^*}{B^*} \le \frac{A + A_1 + A_3}{B + B_1 + B_3}.$$


Each pair of corresponding elements of $A_3$ and $B_3$ satisfies the conditions of Lemma 3; therefore, since

$$A_3 < 0, \quad B_3 < 0,$$

it follows that

$$\frac{A}{B} < \frac{A_3}{B_3}.$$

From Lemma 3 we obtain

$$\frac{A + A_3}{B + B_3} < \frac{A}{B}.$$

Each pair of corresponding elements of $A_1$ and $B_1$ satisfies the conditions of Lemma 1; therefore, since

$$A_1 > 0, \quad B_1 > 0,$$

it follows that

$$\frac{A}{B} > \frac{A_1}{B_1}.$$

From Lemma 1 we obtain

$$\frac{A + A_1}{B + B_1} < \frac{A}{B}.$$
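From

$$\frac{A}{B} < \frac{A_3}{B_3} \quad \text{and} \quad \frac{A + A_1}{B + B_1} < \frac{A}{B}$$

it follows that

$$\frac{A + A_1}{B + B_1} < \frac{A}{B} < \frac{A_3}{B_3}. \qquad (4)$$

Since $A_3 < 0$ and $B_3 < 0$, from $\frac{A + A_1}{B + B_1} < \frac{A_3}{B_3}$ and Lemma 3 we obtain

$$\frac{A + A_1 + A_3}{B + B_1 + B_3} < \frac{A + A_1}{B + B_1}. \qquad (5)$$

From (4) and (5) it follows that

$$\frac{A + A_1 + A_3}{B + B_1 + B_3} < \frac{A}{B}.$$

Hence $I(\lambda^*) \le I(\lambda)$ for every $\lambda^* \in \Lambda^\ell$, i.e. $\lambda$ is optimal.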


Necessity. Suppose there exist $\Delta a_{ij}$ and $\Delta b_{ij}$ satisfying the conditions of Lemma 2 and Lemma 4. We show that, as a result of Lemmas 2 and 4,

$$\frac{A + A_2 + A_4}{B + B_2 + B_4} > \frac{A}{B}.$$


Each pair of corresponding elements of $A_2$ and $B_2$ satisfies the conditions of Lemma 2; therefore, since

$$A_2 > 0, \quad B_2 > 0,$$

it follows that $\frac{A}{B} < \frac{A_2}{B_2}$. From Lemma 2 we obtain

$$\frac{A + A_2}{B + B_2} > \frac{A}{B}.$$


Each pair of corresponding elements of $A_4$ and $B_4$ satisfies the conditions of Lemma 4; therefore, since

$$A_4 < 0, \quad B_4 < 0,$$

it follows that $\frac{A}{B} > \frac{A_4}{B_4}$. From Lemma 4 we obtain

$$\frac{A + A_4}{B + B_4} > \frac{A}{B}.$$


From $\frac{A}{B} < \frac{A_2}{B_2}$ and $\frac{A}{B} > \frac{A_4}{B_4}$ it follows that

$$\frac{A_4}{B_4} < \frac{A}{B} < \frac{A_2}{B_2}. \qquad (6)$$


Since $A_4 < 0$ and $B_4 < 0$, from $\frac{A_4}{B_4} < \frac{A_2}{B_2}$ and Lemma 4 we obtain

$$\frac{A_2 + A_4}{B_2 + B_4} > \frac{A_2}{B_2} > \frac{A}{B}. \qquad (7)$$
Since $A > 0$ and $B > 0$, from $\frac{A_2 + A_4}{B_2 + B_4} > \frac{A}{B}$ and Lemma 1 we obtain

$$\frac{A_2 + A_4}{B_2 + B_4} > \frac{A + A_2 + A_4}{B + B_2 + B_4} > \frac{A}{B}.$$


Each pair of corresponding elements of $A_5$ and $B_5$ satisfies the conditions of Lemma 5; then

$$\frac{A + A_2 + A_4 + A_5}{B + B_2 + B_4 + B_5} > \frac{A + A_2 + A_4}{B + B_2 + B_4} > \frac{A}{B}.$$


From

$$\frac{A + A_2 + A_4 + A_5}{B + B_2 + B_4 + B_5} > \frac{A}{B}$$

we conclude that the vector $\lambda$ corresponding to the value $I(\lambda) = \frac{A}{B}$ is not optimal.
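The test of Theorem 1 is mechanical: for every selected feature i and unselected feature j, classify the pair $(\Delta a_{ij}, \Delta b_{ij})$ against $(A, B)$ and look for Lemma 2, 4 or 5. The Python sketch below assumes strictly positive $a_i, b_i$; pairs with $\Delta a_{ij} = 0$, $\Delta b_{ij} = 0$ or $\frac{A}{B} = \frac{\Delta a_{ij}}{\Delta b_{ij}}$ fall outside Lemmas 1-6 and are reported as `None`, i.e. treated as non-improving, so the sketch relies on such pairs not occurring. The names `lemma_case` and `is_optimal` are hypothetical.

```python
from typing import Optional, Sequence


def lemma_case(da: float, db: float, A: float, B: float) -> Optional[int]:
    """Return k if Lemma k holds for a = da, b = db, c = A, d = B, else None."""
    if da > 0 and db > 0:
        if A / B > da / db:
            return 1
        if A / B < da / db:
            return 2
    elif da < 0 and db < 0:
        if A / B < da / db:
            return 3
        if A / B > da / db:
            return 4
    elif da > 0 and db < 0:
        return 5
    elif da < 0 and db > 0:
        return 6
    return None  # zero differences and ties are outside Lemmas 1-6


def is_optimal(a: Sequence[float], b: Sequence[float], lam: Sequence[int]) -> bool:
    """Theorem 1: lam is optimal iff no swap (drop selected i, add unselected j) meets Lemma 2, 4 or 5."""
    A = sum(ai for ai, li in zip(a, lam) if li)
    B = sum(bi for bi, li in zip(b, lam) if li)
    for i in (k for k, lk in enumerate(lam) if lk):
        for j in (k for k, lk in enumerate(lam) if not lk):
            if lemma_case(a[j] - a[i], b[j] - b[i], A, B) in (2, 4, 5):
                return False
    return True
```

On the example from the text, `is_optimal` accepts (1, 0, 0, 1) and rejects (1, 1, 0, 0): replacing feature 2 by feature 4 gives $\Delta a = -9$, $\Delta b = -31$ with $\frac{A}{B} = \frac{15}{51} > \frac{9}{31}$, which is the situation of Lemma 4.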


If the vector $\lambda$ is not an optimal solution of problem (3), we carry out replacements on the basis of Lemmas 2, 4 and 5. The replacement process continues until no $\Delta a_{ij}$ and $\Delta b_{ij}$ satisfying the conditions of Lemmas 2, 4 and 5 remain; at that point, in accordance with Theorem 1, the solution found is optimal.

In this method, the values of the functional and the components of the vector $\lambda$ are determined as follows. Suppose that one of Lemmas 2, 4 and 5 holds for $\Delta a_{ij}$ and $\Delta b_{ij}$. In this case, by these lemmas,

$$\frac{A + \Delta a_{ij}}{B + \Delta b_{ij}} > \frac{A}{B},$$

and the values of components i and j of the vector $\lambda$ are interchanged. The process of successive interchanges continues until the conditions of Theorem 1 are satisfied.

This method is called the "Delta-2" method; it is implemented by the algorithm denoted $\Delta_2$, which proceeds as follows:

Step 1: Set $\lambda = (\underbrace{1, \dots, 1}_{\ell}, 0, \dots, 0)$.

Step 2: Compute A and B, i.e. $A = (a, \lambda)$, $B = (b, \lambda)$.

Step 3: Set $i = 1$, $j = N$; $A_1 = A$, $B_1 = B$.


Step 4: Compute $\Delta a_{ij}$ and $\Delta b_{ij}$.

Step 5: Check the conditions of Lemma 4. If $\Delta a_{ij}$ and $\Delta b_{ij}$ satisfy them, interchange the values of the i-th and j-th components of $\lambda$, compute $A = A + \Delta a_{ij}$, $B = B + \Delta b_{ij}$ and go to step 9; otherwise, go to the next step.
Step 6: Check the conditions of Lemma 2. If $\Delta a_{ij}$ and $\Delta b_{ij}$ satisfy them, interchange the values of the i-th and j-th components of $\lambda$, compute $A = A + \Delta a_{ij}$, $B = B + \Delta b_{ij}$ and go to step 9; otherwise, go to the next step.

Step 7: Check the conditions of Lemma 5. If $\Delta a_{ij}$ and $\Delta b_{ij}$ satisfy them, interchange the values of the i-th and j-th components of $\lambda$, compute $A = A + \Delta a_{ij}$, $B = B + \Delta b_{ij}$ and go to step 9; otherwise, go to the next step.

Step 8: Check the condition $j > \ell + 1$. If it holds, set $j = j - 1$ and go to step 4; otherwise, go to the next step.

Step 9: Check the condition $i < \ell$. If it holds, set $i = i + 1$, $j = N$ and go to step 4; otherwise, go to the next step.

Step 10: Check whether $A_1 = A$ and $B_1 = B$. If so, the vector $\lambda$ is the optimal solution and the process ends; otherwise, go to step 3.
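A Python sketch of the "Delta-2" procedure follows. It is one reading of Steps 1-10, not the authors' code: the interchange in Steps 5-7 is implemented by exchanging the roles of a selected and an unselected feature; j runs from N down to $\ell + 1$ and restarts at N for each new i; a pass ends after $i = \ell$; and the procedure stops when a pass leaves A and B unchanged (Step 10). Degenerate pairs with zero differences are ignored, as in the lemmas. The names `lemma_case` and `delta2` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def lemma_case(da: float, db: float, A: float, B: float) -> Optional[int]:
    """Return k if Lemma k holds for a = da, b = db, c = A, d = B, else None."""
    if da > 0 and db > 0:
        if A / B > da / db:
            return 1
        if A / B < da / db:
            return 2
    elif da < 0 and db < 0:
        if A / B < da / db:
            return 3
        if A / B > da / db:
            return 4
    elif da > 0 and db < 0:
        return 5
    elif da < 0 and db > 0:
        return 6
    return None  # zero differences and ties are outside Lemmas 1-6


def delta2(a: Sequence[float], b: Sequence[float], ell: int) -> Tuple[List[int], float]:
    """Swap-based search for max (a, lam) / (b, lam) over ell-informative lam."""
    n = len(a)
    sel = list(range(ell))        # Step 1: lam = (1, ..., 1, 0, ..., 0)
    uns = list(range(ell, n))
    A = sum(a[k] for k in sel)    # Step 2: A = (a, lam), B = (b, lam)
    B = sum(b[k] for k in sel)
    while True:
        A1, B1 = A, B             # Step 3: remember the values at the start of the pass
        for s in range(ell):      # Step 9: i = 1, ..., ell
            for t in reversed(range(n - ell)):  # Step 8: j = N, ..., ell + 1
                i, j = sel[s], uns[t]
                da, db = a[j] - a[i], b[j] - b[i]          # Step 4
                if lemma_case(da, db, A, B) in (4, 2, 5):  # Steps 5-7
                    sel[s], uns[t] = j, i  # interchange lam_i and lam_j
                    A, B = A + da, B + db
                    break                  # go to Step 9: next i
        if A == A1 and B == B1:   # Step 10: no interchange in this pass
            break
    lam = [0] * n
    for k in sel:
        lam[k] = 1
    return lam, A / B
```

Each accepted interchange strictly increases $I(\lambda)$, so the loop terminates, and by Theorem 1 the final vector is optimal. For $a = (5, 10, 10, 1)$, $b = (1, 50, 50, 19)$ and $\ell = 2$, the first pass replaces feature 2 by feature 4 (Lemma 4), the second pass makes no change, and the sketch returns $\lambda = (1, 0, 0, 1)$ with $I = 0.3$.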

CONCLUSIONS 

This article establishes optimality conditions for the "Orderings" method and for a chosen vector. Using these results, a new method for selecting informative features with heuristic Fisher-type criteria has been developed.